首页> 外文OA文献 >QuickLexSort: An efficient algorithm for lexicographically sorting nested restrictions of a database
【2h】

QuickLexSort: An efficient algorithm for lexicographically sorting nested restrictions of a database

机译:QuickLexsort:一种用于词典排序的高效算法   嵌套的数据库限制

摘要

Lexicographical sorting is a fundamental problem with applications tocontingency tables, databases, Bayesian networks, and more. A standard methodto lexicographically sort general data is to iteratively use a stable sort -- asort which preserves existing orders. Here we present a new method oflexicographical sorting called QuickLexSort. Whereas a stable sort basedlexicographical sorting algorithm operates from the least important to mostimportant features, in contrast, QuickLexSort sorts from the most important toleast important features, refining the sort as it goes. QuickLexSort firstrequires a one-time modest pre-processing step where each feature of the dataset is sorted independently. When lexicographically sorting a database,QuickLexSort (including pre-processing) has comparable running time to using astable sort based approach. For a data base with $m$ rows and $n$ columns, anda sorting algorithm running in time $O(mlog(m))$, a stable sort basedlexicographical sort and QuickLexSort will both take time $O(nmlog(m))$.However in many applications one has the need to lexicographically sort nesteddata, e.g.\ all possible sub-matrices up to a certain cardinality of columns.In such cases we show QuickLexSort gives a performance improvement of a logfactor of the database length (rows in matrix) over using a standard stablesort based approach. E.g.\ to sort all sub-matrices up to cardinality $k$,QuickLexSort has running time $O(mn^k)$ whereas a stable sort basedlexicographical sort will take time $O(mlog(m)n^k)$. After the pre-processingstep that is run only once for the entire matrix, QuickLexSort has a runningtime linear in the number of nested sub-matrices to sort. We conclude with anapplication to Bayesian network scoring to detect epistasis using SNP markerdata.
机译:字典排序是列联表,数据库,贝叶斯网络等应用程序的基本问题。按字典顺序对常规数据进行排序的标准方法是迭代使用稳定的排序-保留现有订单的排序。在这里,我们介绍了一种称为QuickLexSort的新的书目排序方法。相比之下,稳定的基于排序的词法排序算法可从最不重要的特征到最重要的特征进行操作,而QuickLexSort从最重要的特征到最不重要的特征进行排序,从而对排序进行完善。 QuickLexSort首先需要执行一次适度的预处理步骤,在该步骤中,将数据集的每个特征进行独立排序。按字典顺序对数据库进行排序时,QuickLexSort(包括预处理)的运行时间与使用基于不稳定排序的方法相当。对于具有$ m $行和$ n $列的数据库,以及运行时间为$ O(mlog(m))$的排序算法,基于稳定排序的词典排序和QuickLexSort都将花费时间$ O(nmlog(m)) $。但是,在许多应用程序中,需要按字典顺序对嵌套数据进行排序,例如,所有可能的子矩阵直到特定列的基数。在这种情况下,我们显示QuickLexSort的性能提高了数据库长度的对数因子(行数)。矩阵)。例如\要对所有子矩阵进行排序,直到基数$ k $,QuickLexSort的运行时间为$ O(mn ^ k)$,而基于稳定排序的字典排序将花费时间$ O(mlog(m)n ^ k)$。在仅对整个矩阵运行一次的预处理步骤之后,QuickLexSort的运行时间与要排序的嵌套子矩阵的数量成线性关系。最后,我们将其应用于贝叶斯网络评分以使用SNP标记数据检测上位性。

著录项

  • 作者

    Haws, David;

  • 作者单位
  • 年度 2013
  • 总页数
  • 原文格式 PDF
  • 正文语种 {"code":"en","name":"English","id":9}
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号